Simulation Studies for Evaluating ForestSearch Performance
Operating Characteristics and Power Analysis
1 Introduction
This vignette demonstrates how to conduct simulation studies to evaluate the performance of ForestSearch for identifying subgroups with differential treatment effects. The simulation framework allows you to:
- Generate synthetic clinical trial data with known treatment effect heterogeneity
- Evaluate subgroup identification rates (power)
- Assess classification accuracy (sensitivity, specificity, PPV, NPV)
- Compare different analysis methods (ForestSearch, GRF)
- Estimate Type I error under null hypothesis
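As a minimal sketch of how the classification metrics above are defined, the function below computes sensitivity, specificity, PPV, and NPV from true versus identified subgroup membership. The vectors `truth` and `pred` and the function name are hypothetical illustrations, not actual ForestSearch output:

```r
# Hedged sketch: classification metrics for subgroup identification.
# `truth` and `pred` are hypothetical 0/1 indicators of true vs
# identified harm-subgroup membership (not package output).
classification_metrics <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)   # true positives
  tn <- sum(truth == 0 & pred == 0)   # true negatives
  fp <- sum(truth == 0 & pred == 1)   # false positives
  fn <- sum(truth == 1 & pred == 0)   # false negatives
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn))
}
truth <- c(1, 1, 1, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 0, 1, 0, 0, 0, 0)
round(classification_metrics(truth, pred), 3)
```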
1.1 Simulation Framework Overview
The simulation workflow consists of four main steps:
- Create DGM: Define a data generating mechanism with specified treatment effects
- Simulate Trials: Generate multiple simulated datasets
- Run Analyses: Apply ForestSearch (and optionally GRF) to each dataset
- Summarize Results: Aggregate operating characteristics across simulations
1.2 Key Updates in This Version
The simulation framework has been aligned with generate_aft_dgm_flex() methodology:
| Feature | Description |
|---|---|
| Individual Potential Outcomes | theta_0, theta_1, loghr_po columns |
| Average Hazard Ratios (AHR) | Alternative to Cox-based HR estimation |
| Stacked PO for Cox HR | Same epsilon for causal inference |
| use_twostage Parameter | Faster exploratory analysis option |
| Backward Compatible | Works with old and new DGM formats |
2 Setup
# Core packages
library(forestsearch)
library(weightedsurv)
library(data.table)
library(survival)
library(ggplot2)
library(gt)
# Parallel processing
library(foreach)
library(doFuture)
library(future)
# Source simulation functions (if not yet in package)
# source("sim_aft_gbsg_refactored.R")
# source("oc_analyses_gbsg.R")
3 Creating a Data Generating Mechanism
The simulation framework uses the German Breast Cancer Study Group (GBSG) dataset as a template for realistic covariate distributions and censoring patterns.
3.1 Understanding the DGM
The create_gbsg_dgm() function creates a data generating mechanism (DGM) based on an Accelerated Failure Time (AFT) model with Weibull distribution. Key features:
- Covariates: Age, estrogen receptor, menopausal status, progesterone receptor, nodes
- Treatment effect heterogeneity: Specified via interaction terms
- Subgroup definition: H = {low estrogen receptor AND premenopausal}
- Censoring: Weibull or uniform censoring model
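To make the AFT structure concrete, here is a toy Weibull-AFT generator with a treatment-by-subgroup interaction. All names and coefficients (`z_sub`, the 0.3 and -0.8 effects, the 24-month baseline scale) are illustrative assumptions, not the internals of create_gbsg_dgm():

```r
# Toy AFT-Weibull data generator (illustrative only; not the
# create_gbsg_dgm() implementation).
set.seed(1)
n      <- 1000
treat  <- rbinom(n, 1, 0.5)              # 1:1 randomization
z_sub  <- rbinom(n, 1, 0.15)             # hypothetical "harm" subgroup flag
# AFT linear predictor on the log-time scale: treatment lengthens
# survival overall, but the interaction shortens it in the subgroup
lp     <- 0.3 * treat - 0.8 * treat * z_sub
t_surv <- rweibull(n, shape = 1.5, scale = exp(lp) * 24)
c_time <- runif(n, 0, 84)                # uniform censoring over 84 months
y      <- pmin(t_surv, c_time)           # observed follow-up time
event  <- as.numeric(t_surv <= c_time)   # event indicator
mean(event)                              # observed event fraction
```

Under the AFT parameterization, covariates act multiplicatively on survival time via the Weibull scale, which is why the linear predictor enters through `exp(lp)`.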
3.1.1 New Output Structure (Aligned with generate_aft_dgm_flex)
The DGM now includes:
dgm$hazard_ratios <- list(
overall = hr_causal, # Cox-based overall HR
AHR = AHR, # Average HR from loghr_po
AHR_harm = AHR_H_true, # AHR in harm subgroup
AHR_no_harm = AHR_Hc_true, # AHR in complement
harm_subgroup = hr_H_true, # Cox-based HR in H
no_harm_subgroup = hr_Hc_true # Cox-based HR in Hc
)
The super-population data (dgm$df_super_rand) now contains:
| Column | Description |
|---|---|
| theta_0 | Log-hazard contribution under control |
| theta_1 | Log-hazard contribution under treatment |
| loghr_po | Individual causal log hazard ratio (theta_1 - theta_0) |
3.2 Alternative Hypothesis (Heterogeneous Treatment Effect)
Under the alternative hypothesis, we create a DGM where the treatment effect varies across patient subgroups:
# Create DGM with heterogeneous treatment effect
# HR in harm subgroup (H) will be > 1 (treatment harmful)
# HR in complement (H^c) will be < 1 (treatment beneficial)
dgm_alt <- create_gbsg_dgm(
model = "alt",
k_treat = 1.0,
k_inter = 2.0, # Interaction effect multiplier
k_z3 = 1.0,
z1_quantile = 0.25, # ER threshold at 25th percentile
n_super = 5000,
cens_type = "weibull",
seed = 8316951,
verbose = TRUE
)
# Examine the DGM (print method now shows both HR and AHR metrics)
print(dgm_alt)

GBSG-Based AFT Data Generating Mechanism (Aligned)
===================================================
Model type: alt
Super-population size: 5000
Effect Modifiers:
k_treat: 1
k_inter: 2
k_z3: 1
Hazard Ratios (Cox-based, stacked PO):
Overall (causal): 0.7331
Harm subgroup (H): 2.9651
Complement (Hc): 0.6612
Ratio HR(H)/HR(Hc): 4.4846
Average Hazard Ratios (from loghr_po):
AHR (overall): 0.7431
AHR_harm (H): 3.8687
AHR_no_harm (Hc): 0.5848
Ratio AHR(H)/AHR(Hc): 6.6157
Subgroup definition: z1 == 1 & z3 == 1 (low ER & premenopausal)
ER threshold: 8 (quantile = 0.25)
Subgroup size: 634 (12.7%)
Analysis variables: v1, v2, v3, v4, v5, v6, v7
3.2.1 Accessing Hazard Ratios (New Aligned Format)
# Traditional access (backward compatible)
cat("Cox-based HRs:\n")
cat(" HR(H):", round(dgm_alt$hr_H_true, 4), "\n")
cat(" HR(Hc):", round(dgm_alt$hr_Hc_true, 4), "\n")
cat(" HR(overall):", round(dgm_alt$hr_causal, 4), "\n")

Cox-based HRs:
 HR(H): 2.9651
 HR(Hc): 0.6612
 HR(overall): 0.7331

# New AHR metrics (aligned with generate_aft_dgm_flex)
cat("\nAverage Hazard Ratios (from loghr_po):\n")
cat(" AHR(H):", round(dgm_alt$AHR_H_true, 4), "\n")
cat(" AHR(Hc):", round(dgm_alt$AHR_Hc_true, 4), "\n")
cat(" AHR(overall):", round(dgm_alt$AHR, 4), "\n")

Average Hazard Ratios (from loghr_po):
 AHR(H): 3.8687
 AHR(Hc): 0.5848
 AHR(overall): 0.7431

# Using hazard_ratios list (unified access)
cat("\nVia hazard_ratios list:\n")
cat(" harm_subgroup:", round(dgm_alt$hazard_ratios$harm_subgroup, 4), "\n")
cat(" AHR_harm:", round(dgm_alt$hazard_ratios$AHR_harm, 4), "\n")

Via hazard_ratios list:
 harm_subgroup: 2.9651
 AHR_harm: 3.8687
3.2.2 Examining Individual-Level Treatment Effects
# The super-population now includes individual log hazard ratios
df_super <- dgm_alt$df_super_rand
cat("Individual-level potential outcomes:\n")
cat(" theta_0 (control log-hazard) range:",
    round(range(df_super$theta_0), 3), "\n")
cat(" theta_1 (treatment log-hazard) range:",
    round(range(df_super$theta_1), 3), "\n")
cat(" loghr_po (individual log-HR) range:",
    round(range(df_super$loghr_po), 3), "\n")

Individual-level potential outcomes:
 theta_0 (control log-hazard) range: -0.891 1.76
 theta_1 (treatment log-hazard) range: -1.427 2.909
 loghr_po (individual log-HR) range: -0.537 1.353

# Verify AHR calculation
ahr_manual <- exp(mean(df_super$loghr_po))
cat("\nAHR verification:\n")
cat(" exp(mean(loghr_po)) =", round(ahr_manual, 4), "\n")
cat(" dgm$AHR =", round(dgm_alt$AHR, 4), "\n")

AHR verification:
 exp(mean(loghr_po)) = 0.7431
 dgm$AHR = 0.7431

# Distribution of individual treatment effects
cat("\nIndividual HR distribution:\n")
individual_hr <- exp(df_super$loghr_po)
cat(" Mean:", round(mean(individual_hr), 4), "\n")
cat(" Median:", round(median(individual_hr), 4), "\n")
cat(" SD:", round(sd(individual_hr), 4), "\n")

Individual HR distribution:
 Mean: 1.0012
 Median: 0.5848
 SD: 1.0928
3.3 Calibrating for a Target Hazard Ratio
Often, you want to specify the exact hazard ratio in the harm subgroup. Use calibrate_k_inter() to find the interaction parameter that achieves this.
3.3.1 Calibrate to Cox-based HR (Default)
# Find k_inter for Cox-based HR = 2.0 in harm subgroup
k_inter_cox <- calibrate_k_inter(
target_hr_harm = 2.0,
model = "alt",
k_treat = 0.9,
cens_type = "weibull",
use_ahr = FALSE, # Default: calibrate to Cox-based HR
verbose = TRUE
)
# Create DGM with calibrated k_inter
dgm_calibrated_cox <- create_gbsg_dgm(
model = "alt",
k_treat = 0.9,
k_inter = k_inter_cox,
verbose = TRUE
)
cat("\nVerification (Cox-based):\n")
cat("Achieved HR(H):", round(dgm_calibrated_cox$hr_H_true, 3), "\n")
cat("HR(H^c):", round(dgm_calibrated_cox$hr_Hc_true, 3), "\n")
cat("Overall HR:", round(dgm_calibrated_cox$hr_causal, 3), "\n")

Verification (Cox-based):
Achieved HR(H): 2
HR(H^c): 0.689
Overall HR: 0.749
3.3.2 Calibrate to AHR (New Option)
# Alternatively, calibrate to Average Hazard Ratio
k_inter_ahr <- calibrate_k_inter(
target_hr_harm = 2.0,
model = "alt",
k_treat = 0.9,
cens_type = "weibull",
use_ahr = TRUE, # NEW: calibrate to AHR instead
verbose = TRUE
)
# Create DGM with AHR-calibrated k_inter
dgm_calibrated_ahr <- create_gbsg_dgm(
model = "alt",
k_treat = 0.9,
k_inter = k_inter_ahr,
verbose = TRUE
)
cat("\nVerification (AHR-based):\n")
cat("Achieved AHR(H):", round(dgm_calibrated_ahr$AHR_H_true, 3), "\n")
cat("AHR(H^c):", round(dgm_calibrated_ahr$AHR_Hc_true, 3), "\n")
cat("Overall AHR:", round(dgm_calibrated_ahr$AHR, 3), "\n")

Verification (AHR-based):
Achieved AHR(H): 2
AHR(H^c): 0.617
Overall AHR: 0.716
3.3.3 Compare Cox HR vs AHR Calibration
# Compare the two calibration approaches
cat("Comparison of calibration methods:\n")
cat(sprintf("%-20s %-12s %-12s\n", "Metric", "Cox-calib", "AHR-calib"))
cat(sprintf("%-20s %-12.4f %-12.4f\n", "k_inter", k_inter_cox, k_inter_ahr))
cat(sprintf("%-20s %-12.4f %-12.4f\n", "HR(H)",
    dgm_calibrated_cox$hr_H_true, dgm_calibrated_ahr$hr_H_true))
cat(sprintf("%-20s %-12.4f %-12.4f\n", "AHR(H)",
    dgm_calibrated_cox$AHR_H_true, dgm_calibrated_ahr$AHR_H_true))

Comparison of calibration methods:
Metric               Cox-calib    AHR-calib
k_inter              1.4379       1.2448
HR(H)                2.0000       1.7233
AHR(H)               2.4001       2.0000
3.4 Validating k_inter Effect on Heterogeneity
Use validate_k_inter_effect() to verify the interaction parameter properly modulates treatment effect heterogeneity:
# Test k_inter effect on HR heterogeneity
# k_inter = 0 should give ratio ≈ 1 (no heterogeneity)
validation_results <- validate_k_inter_effect(
k_inter_values = c(-2, -1, 0, 1, 2, 3),
verbose = TRUE
)

Testing k_inter effect on HR heterogeneity...
k_inter HR(H) HR(Hc) AHR(H) AHR(Hc) Ratio(Cox) Ratio(AHR)
----------------------------------------------------------------------
-2.0 0.1336 0.6612 0.0884 0.5848 0.2021 0.1512
-1.0 0.3033 0.6612 0.2274 0.5848 0.4587 0.3888
0.0 0.6552 0.6612 0.5848 0.5848 0.9909 1.0000
1.0 1.3873 0.6612 1.5041 0.5848 2.0982 2.5721
2.0 2.9651 0.6612 3.8687 0.5848 4.4846 6.6157
3.0 6.6375 0.6612 9.9507 0.5848 10.0387 17.0162
PASS: k_inter = 0 gives Cox ratio ~= 1 (no heterogeneity)
PASS: k_inter = 0 gives AHR ratio ~= 1 (no heterogeneity)
AHR vs Cox HR alignment:
k_inter = -2.0: HR(H) vs AHR(H) diff = 0.0452
k_inter = -1.0: HR(H) vs AHR(H) diff = 0.0759
k_inter = 0.0: HR(H) vs AHR(H) diff = 0.0704
k_inter = 1.0: HR(H) vs AHR(H) diff = 0.1168
k_inter = 2.0: HR(H) vs AHR(H) diff = 0.9036
k_inter = 3.0: HR(H) vs AHR(H) diff = 3.3132
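The widening HR(H)-versus-AHR(H) gap at larger k_inter is expected: the AHR is the geometric mean of individual hazard ratios, whereas a Cox fit pools hazards on a different scale, so the two agree only when effects are nearly homogeneous. A toy computation, unrelated to the package, illustrating the scale difference:

```r
# Toy illustration: under heterogeneity, the geometric mean of
# individual HRs (the AHR scale) differs from the arithmetic mean.
ind_hr <- c(4, 4, 0.5, 0.5)      # hypothetical individual hazard ratios
exp(mean(log(ind_hr)))           # geometric mean (AHR scale): sqrt(2) ~ 1.414
mean(ind_hr)                     # arithmetic mean: 2.25
```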
3.5 Null Hypothesis (Uniform Treatment Effect)
For Type I error evaluation, create a DGM with uniform treatment effect:
# Create null DGM (no treatment effect heterogeneity)
dgm_null <- create_gbsg_dgm(
model = "null",
k_treat = 0.9,
verbose = TRUE
)
cat("\nNull hypothesis HRs:\n")
Null hypothesis HRs:
cat("Overall HR:", round(dgm_null$hr_causal, 3), "\n")Overall HR: 0.746
cat("HR(H^c):", round(dgm_null$hr_Hc_true, 3), "\n")HR(H^c): 0.746
cat("AHR(H^c):", round(dgm_null$AHR_Hc_true, 3), "\n")AHR(H^c): 0.683
cat("AHR:", round(dgm_null$AHR, 3), "\n")AHR: 0.683
4 Simulating Trial Data
4.1 Single Trial Simulation
Use simulate_from_gbsg_dgm() to generate a single simulated trial:
# Use the AHR-calibrated DGM for simulations
dgm_calibrated <- dgm_calibrated_ahr
# Simulate a single trial
sim_data <- simulate_from_gbsg_dgm(
dgm = dgm_calibrated,
n = 700,
rand_ratio = 1, # 1:1 randomization
sim_id = 1,
max_follow = 84, # 84 months administrative censoring
muC_adj = log(1.5) # Censoring adjustment
)
# Examine the data
cat("Simulated trial:\n")
cat(" N =", nrow(sim_data), "\n")
cat(" Events =", sum(sim_data$event.sim),
    "(", round(100 * mean(sim_data$event.sim), 1), "%)\n")
cat(" Harm subgroup size =", sum(sim_data$flag.harm),
    "(", round(100 * mean(sim_data$flag.harm), 1), "%)\n")

Simulated trial:
 N = 700
 Events = 381 ( 54.4 %)
 Harm subgroup size = 86 ( 12.3 %)

# Quick survival analysis
fit_itt <- coxph(Surv(y.sim, event.sim) ~ treat, data = sim_data)
cat(" Estimated ITT HR =", round(exp(coef(fit_itt)), 3), "\n")

 Estimated ITT HR = 0.767
4.1.1 Examining Individual-Level Effects in Simulated Data
# The simulated data now includes loghr_po
if ("loghr_po" %in% names(sim_data)) {
cat("\nIndividual treatment effects in simulated trial:\n")
# Compute AHR in simulated data by subgroup
ahr_H_sim <- exp(mean(sim_data$loghr_po[sim_data$flag.harm == 1]))
ahr_Hc_sim <- exp(mean(sim_data$loghr_po[sim_data$flag.harm == 0]))
ahr_overall_sim <- exp(mean(sim_data$loghr_po))
cat(" AHR(H) in sim:", round(ahr_H_sim, 3), "\n")
cat(" AHR(Hc) in sim:", round(ahr_Hc_sim, 3), "\n")
cat(" AHR(overall) in sim:", round(ahr_overall_sim, 3), "\n")
} else {
cat("\nNote: loghr_po not available in simulated data\n")
}
Individual treatment effects in simulated trial:
AHR(H) in sim: 2
AHR(Hc) in sim: 0.617
AHR(overall) in sim: 0.713
4.2 Examining Covariate Structure
dfcount <- df_counting(
df = sim_data,
by.risk = 6,
tte.name = "y.sim",
event.name = "event.sim",
treat.name = "treat"
)
plot_weighted_km(dfcount, conf.int = TRUE, show.logrank = TRUE,
ymax = 1.05, xmed.fraction = 0.775, ymed.offset = 0.125)
create_summary_table(
data = sim_data,
treat_var = "treat",
table_title = "Characteristics by Treatment Arm",
vars_continuous = c("z1", "z2", "size", "z3", "z4", "z5"),
vars_categorical = c("flag.harm", "grade3"),
font_size = 12
)

| Characteristics by Treatment Arm | |||||
| Characteristic | Control (n=350) | Treatment (n=350) | P-value1 | SMD2 | |
|---|---|---|---|---|---|
| z1 | Mean (SD) | 0.3 (0.4) | 0.2 (0.4) | 0.729 | 0.03 |
| z2 | Mean (SD) | 2.4 (1.1) | 2.5 (1.1) | 0.157 | 0.11 |
| size | Mean (SD) | 29.2 (14.3) | 30.1 (17.0) | 0.461 | 0.06 |
| z3 | Mean (SD) | 0.4 (0.5) | 0.4 (0.5) | 0.249 | 0.09 |
| z4 | Mean (SD) | 2.6 (1.1) | 2.5 (1.1) | 0.428 | 0.06 |
| z5 | Mean (SD) | 2.4 (1.1) | 2.4 (1.1) | 0.563 | 0.04 |
| flag.harm | 0.908 | 0.00 | |||
| 0 | 306 (87.4%) | 308 (88.0%) | |||
| 1 | 44 (12.6%) | 42 (12.0%) | |||
| grade3 | 0.530 | 0.02 | |||
| 0 | 273 (78.0%) | 265 (75.7%) | |||
| 1 | 77 (22.0%) | 85 (24.3%) | |||
| 1 P-values: t-test for continuous, chi-square/Fisher's exact for categorical/binary variables | |||||
| 2 SMD = Standardized mean difference (Cohen's d for continuous, Cramer's V for categorical) | |||||
5 Running Simulation Studies
5.1 Setting Up Parallel Processing
For efficient simulation studies, use parallel processing:
# Configure parallel backend
n_workers <- min(parallel::detectCores() - 1, 120)
plan(multisession, workers = n_workers)
registerDoFuture()
cat("Using", n_workers, "parallel workers\n")

Using 13 parallel workers
5.2 Define Simulation Parameters
# Simulation settings
sim_config_alt <- list(
n_sims = 20, # Number of simulations (use 500-1000 for final)
n_sample = 700, # Sample size per trial
max_follow = 84, # Maximum follow-up (months)
seed_base = 8316951,
muC_adj = log(1.5)
)
sim_config_null <- list(
n_sims = 20, # Number of simulations (use 1000+ for final Type I error estimation)
n_sample = 700, # Sample size per trial
max_follow = 84, # Maximum follow-up (months)
seed_base = 8316951,
muC_adj = log(1.5)
)
# ForestSearch parameters (now includes use_twostage)
fs_params <- list(
outcome.name = "y.sim",
event.name = "event.sim",
treat.name = "treat",
id.name = "id",
use_lasso = FALSE,
use_grf = TRUE,
hr.threshold = 1.25,
hr.consistency = 1.0,
pconsistency.threshold = 0.90,
fs.splits = 400,
n.min = 60,
d0.min = 10,
d1.min = 10,
maxk = 2,
max.minutes = 5,
by.risk = 12,
vi.grf.min = -0.2,
# NEW: Two-stage algorithm option
use_twostage = TRUE, # Set TRUE for faster exploratory analysis
twostage_args = list() # Optional tuning parameters
)
# Confounders (use v1-v7 factor names, matching DGM analysis_vars)
#confounders_base <- c("v1", "v2", "v3", "v4", "v5", "v6", "v7")
# Confounders for analysis
confounders_base <- c("z1", "z2", "z3", "z4", "z5", "size", "grade3")
5.2.1 Two-Stage Algorithm Option
The use_twostage parameter enables a faster two-stage search algorithm:
# Fast configuration with two-stage algorithm
fs_params_fast <- modifyList(fs_params, list(
use_twostage = TRUE,
twostage_args = list(
n.splits.screen = 30, # Stage 1 screening splits
batch.size = 20, # Stage 2 batch size
conf.level = 0.95 # Early stopping confidence
)
))
cat("Standard search: use_twostage =", fs_params$use_twostage, "\n")
cat("Fast search: use_twostage =", fs_params_fast$use_twostage, "\n")

Standard search: use_twostage = TRUE
Fast search: use_twostage = TRUE
5.3 Running Alternative Hypothesis Simulations
cat("Running", sim_config_alt$n_sims, "simulations under H1...\n")

Running 20 simulations under H1...
start_time <- Sys.time()
results_alt <- foreach(
sim = 1:sim_config_alt$n_sims,
.combine = rbind,
.errorhandling = "remove",
.options.future = list(
packages = c("forestsearch", "survival", "data.table"),
seed = TRUE
)
) %dofuture% {
run_simulation_analysis(
sim_id = sim,
dgm = dgm_calibrated,
n_sample = sim_config_alt$n_sample,
max_follow = sim_config_alt$max_follow,
muC_adj = sim_config_alt$muC_adj,
confounders_base = confounders_base,
cox_formula_adj = survival::Surv(y.sim, event.sim) ~ treat + z1 + z2 + z3,
n_add_noise = 0L,
run_fs = TRUE,
run_fs_grf = FALSE,
run_grf = TRUE,
fs_params = fs_params,
verbose = TRUE,
debug = TRUE,
verbose_n = 3 # Only print first 3 simulations
)
}
=== Two-Stage Consistency Evaluation Enabled ===
Stage 1 screening splits: 30
Maximum total splits: 400
Batch size: 20
================================================
GRF stage for cut selection with dmin, tau = 12 0.6
tau, maxdepth = 48.47651 2
leaf.node control.mean control.size control.se depth
2 3 -3.74 692.00 1.03 1
1 4 -6.37 426.00 1.22 2
21 5 2.55 142.00 2.60 2
4 7 -6.73 78.00 3.15 2
GRF subgroup NOT found
GRF cuts identified: 0
# of continuous/categorical characteristics 4 3
Continuous characteristics: z2 z4 z5 size
Categorical characteristics: z1 z3 grade3
Default cuts included (1st 20) z2 <= mean(z2) z2 <= median(z2) z2 <= qlow(z2) z2 <= qhigh(z2) z4 <= mean(z4) z4 <= median(z4) z4 <= qlow(z4) z4 <= qhigh(z4) z5 <= mean(z5) z5 <= median(z5) z5 <= qlow(z5) z5 <= qhigh(z5) size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size)
Categorical: z1 z3 grade3
===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 17 cut expressions once and caching...
Cut evaluation summary:
Total cuts: 17
Valid cuts: 16
Errors: 0
Dropping variables (cut only has 1 level): z2 <= 4
Total cuts after dropping invalid: 16
✓ All 16 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====
# of candidate subgroup factors= 16
[1] "z2 <= 2.5" "z2 <= 2" "z4 <= 2.5" "z4 <= 3" "z4 <= 2"
[6] "z5 <= 2.4" "z5 <= 2" "z5 <= 1" "z5 <= 3" "size <= 29.1"
[11] "size <= 25" "size <= 20" "size <= 35" "z1" "z3"
[16] "grade3"
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 528
Events criteria: control >= 10 , treatment >= 10
Subgroup search completed in 0.02 minutes
Found 7 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 7
Random seed set to: 8316951
Removed 2 near-duplicate subgroups
Original rows: 7
After removal: 5
# of unique initial candidates: 5
# Restricting to top stop_Kgroups = 10
# of candidates to evaluate: 5
Algorithm: Two-stage sequential
Stage 1 splits: 30
Screen threshold: 0.763
Max total splits: 400
Batch size: 20
Parallel processing: callr with 6 workers
All subgroup evaluations returned NULL
Minutes forestsearch overall = 0.05
Consistency algorithm used: twostage
tau, maxdepth = 48.47651 2
leaf.node control.mean control.size control.se depth
2 3 -3.74 692.00 1.03 1
1 4 -6.37 426.00 1.22 2
21 5 2.55 142.00 2.60 2
4 7 -6.73 78.00 3.15 2
GRF subgroup NOT found
=== Two-Stage Consistency Evaluation Enabled ===
Stage 1 screening splits: 30
Maximum total splits: 400
Batch size: 20
================================================
GRF stage for cut selection with dmin, tau = 12 0.6
tau, maxdepth = 49.03326 2
leaf.node control.mean control.size control.se depth
1 2 -5.22 534.00 1.19 1
2 3 1.81 166.00 2.28 1
3 4 -4.80 262.00 1.64 2
4 5 8.82 96.00 2.85 2
5 6 -6.63 336.00 1.53 2
Selected subgroup:
leaf.node control.mean control.size control.se depth
4 5 8.82 96.00 2.85 2
GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "z1 <= 0"
All splits:
[1] "z1 <= 0" "z2 <= 2" "size <= 58"
GRF cuts identified: 3
Cuts: z1 <= 0, z2 <= 2, size <= 58
# of continuous/categorical characteristics 4 3
Continuous characteristics: z2 z4 z5 size
Categorical characteristics: z1 z3 grade3
Default cuts included (1st 20) z2 <= mean(z2) z2 <= median(z2) z2 <= qlow(z2) z2 <= qhigh(z2) z4 <= mean(z4) z4 <= median(z4) z4 <= qlow(z4) z4 <= qhigh(z4) z5 <= mean(z5) z5 <= median(z5) z5 <= qlow(z5) z5 <= qhigh(z5) size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size)
Categorical: z1 z3 grade3
Factors per GRF: z1 <= 0 z2 <= 2 size <= 58
Initial GRF cuts included z1 <= 0 z2 <= 2 size <= 58
===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 18 cut expressions once and caching...
Cut evaluation summary:
Total cuts: 18
Valid cuts: 18
Errors: 0
✓ All 18 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====
# of candidate subgroup factors= 18
[1] "z2 <= 2" "size <= 58" "z2 <= 2.5" "z2 <= 3" "z4 <= 2.5"
[6] "z4 <= 3" "z4 <= 2" "z5 <= 2.5" "z5 <= 2" "z5 <= 1.8"
[11] "z5 <= 3" "size <= 29.4" "size <= 25" "size <= 20" "size <= 35"
[16] "z1" "z3" "grade3"
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 666
Events criteria: control >= 10 , treatment >= 10
Subgroup search completed in 0.02 minutes
Found 22 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 22
Random seed set to: 8316951
Removed 10 near-duplicate subgroups
Original rows: 22
After removal: 12
# of unique initial candidates: 12
# Restricting to top stop_Kgroups = 10
# of candidates to evaluate: 10
Algorithm: Two-stage sequential
Stage 1 splits: 30
Screen threshold: 0.763
Max total splits: 400
Batch size: 20
Parallel processing: callr with 6 workers
SG focus= hr
Subgroup Consistency Minutes= 0.029
Algorithm used: Two-stage sequential
Candidates evaluated: 10
Candidates passed: 4
Subgroup found (FS) with sg_focus='hr'
Selected subgroup: {z3} & {z1}
Minutes forestsearch overall = 0.05
Consistency algorithm used: twostage
tau, maxdepth = 49.03326 2
leaf.node control.mean control.size control.se depth
1 2 -5.22 534.00 1.19 1
2 3 1.81 166.00 2.28 1
3 4 -4.80 262.00 1.64 2
4 5 8.82 96.00 2.85 2
5 6 -6.63 336.00 1.53 2
Selected subgroup:
leaf.node control.mean control.size control.se depth
4 5 8.82 96.00 2.85 2
GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "z1 <= 0"
All splits:
[1] "z1 <= 0" "z2 <= 2" "size <= 58"
=== Two-Stage Consistency Evaluation Enabled ===
Stage 1 screening splits: 30
Maximum total splits: 400
Batch size: 20
================================================
GRF stage for cut selection with dmin, tau = 12 0.6
tau, maxdepth = 49.91866 2
leaf.node control.mean control.size control.se depth
1 2 -5.91 494.00 1.23 1
2 3 4.62 206.00 2.12 1
21 5 -6.34 123.00 2.10 2
3 6 -6.56 385.00 1.49 2
4 7 6.32 143.00 2.58 2
Selected subgroup:
leaf.node control.mean control.size control.se depth
4 7 6.32 143.00 2.58 2
GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "z1 <= 0"
All splits:
[1] "z1 <= 0" "z5 <= 1" "size <= 18"
GRF cuts identified: 3
Cuts: z1 <= 0, z5 <= 1, size <= 18
# of continuous/categorical characteristics 4 3
Continuous characteristics: z2 z4 z5 size
Categorical characteristics: z1 z3 grade3
Default cuts included (1st 20) z2 <= mean(z2) z2 <= median(z2) z2 <= qlow(z2) z2 <= qhigh(z2) z4 <= mean(z4) z4 <= median(z4) z4 <= qlow(z4) z4 <= qhigh(z4) z5 <= mean(z5) z5 <= median(z5) z5 <= qlow(z5) z5 <= qhigh(z5) size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size)
Categorical: z1 z3 grade3
Factors per GRF: z1 <= 0 z5 <= 1 size <= 18
Initial GRF cuts included z1 <= 0 z5 <= 1 size <= 18
===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 20 cut expressions once and caching...
Cut evaluation summary:
Total cuts: 20
Valid cuts: 19
Errors: 0
Dropping variables (cut only has 1 level): z4 <= 4
Total cuts after dropping invalid: 19
✓ All 19 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====
# of candidate subgroup factors= 19
[1] "z5 <= 1" "size <= 18" "z2 <= 2.4" "z2 <= 2" "z2 <= 1"
[6] "z2 <= 3" "z4 <= 2.5" "z4 <= 2" "z4 <= 1" "z5 <= 2.5"
[11] "z5 <= 2" "z5 <= 3" "size <= 29.3" "size <= 26" "size <= 20"
[16] "size <= 35" "z1" "z3" "grade3"
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 741
Events criteria: control >= 10 , treatment >= 10
Subgroup search completed in 0.02 minutes
Found 34 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 34
Random seed set to: 8316951
Removed 10 near-duplicate subgroups
Original rows: 34
After removal: 24
# of unique initial candidates: 24
# Restricting to top stop_Kgroups = 10
# of candidates to evaluate: 10
Algorithm: Two-stage sequential
Stage 1 splits: 30
Screen threshold: 0.763
Max total splits: 400
Batch size: 20
Parallel processing: callr with 6 workers
SG focus= hr
Subgroup Consistency Minutes= 0.037
Algorithm used: Two-stage sequential
Candidates evaluated: 10
Candidates passed: 7
Subgroup found (FS) with sg_focus='hr'
Selected subgroup: {z1} & {z2 <= 1}
Minutes forestsearch overall = 0.06
Consistency algorithm used: twostage
tau, maxdepth = 49.91866 2
leaf.node control.mean control.size control.se depth
1 2 -5.91 494.00 1.23 1
2 3 4.62 206.00 2.12 1
21 5 -6.34 123.00 2.10 2
3 6 -6.56 385.00 1.49 2
4 7 6.32 143.00 2.58 2
Selected subgroup:
leaf.node control.mean control.size control.se depth
4 7 6.32 143.00 2.58 2
GRF subgroup found
Terminating node at max.diff (sg.harm.id):
[1] "z1 <= 0"
All splits:
[1] "z1 <= 0" "z5 <= 1" "size <= 18"
runtime_alt <- difftime(Sys.time(), start_time, units = "mins")
cat("Completed in", round(runtime_alt, 1), "minutes\n")Completed in 1.1 minutes
cat("Results:", nrow(results_alt), "rows\n")Results: 40 rows
5.4 Running Null Hypothesis Simulations
cat("Running", sim_config_null$n_sims, "simulations under H0...\n")

Running 20 simulations under H0...
start_time <- Sys.time()
results_null <- foreach(
sim = 1:sim_config_null$n_sims,
.combine = rbind,
.errorhandling = "remove",
.options.future = list(
packages = c("forestsearch", "survival", "data.table"),
seed = TRUE
)
) %dofuture% {
run_simulation_analysis(
sim_id = sim,
dgm = dgm_null,
n_sample = sim_config_null$n_sample,
max_follow = sim_config_null$max_follow,
muC_adj = sim_config_null$muC_adj,
confounders_base = confounders_base,
cox_formula_adj = survival::Surv(y.sim, event.sim) ~ treat + z1 + z2 + z3,
n_add_noise = 0L,
run_fs = TRUE,
run_fs_grf = FALSE,
run_grf = TRUE,
fs_params = fs_params,
verbose = TRUE,
verbose_n = 3 # Only print first 3 simulations
)
}
=== Two-Stage Consistency Evaluation Enabled ===
Stage 1 screening splits: 30
Maximum total splits: 400
Batch size: 20
================================================
GRF stage for cut selection with dmin, tau = 12 0.6
tau, maxdepth = 47.91247 2
leaf.node control.mean control.size control.se depth
2 3 -4.44 695.00 1.00 1
1 4 -4.77 568.00 1.11 2
GRF subgroup NOT found
GRF cuts identified: 0
# of continuous/categorical characteristics 4 3
Continuous characteristics: z2 z4 z5 size
Categorical characteristics: z1 z3 grade3
Default cuts included (1st 20) z2 <= mean(z2) z2 <= median(z2) z2 <= qlow(z2) z2 <= qhigh(z2) z4 <= mean(z4) z4 <= median(z4) z4 <= qlow(z4) z4 <= qhigh(z4) z5 <= mean(z5) z5 <= median(z5) z5 <= qlow(z5) z5 <= qhigh(z5) size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size)
Categorical: z1 z3 grade3
===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 17 cut expressions once and caching...
Cut evaluation summary:
Total cuts: 17
Valid cuts: 16
Errors: 0
Dropping variables (cut only has 1 level): z2 <= 4
Total cuts after dropping invalid: 16
✓ All 16 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====
# of candidate subgroup factors= 16
[1] "z2 <= 2.5" "z2 <= 2" "z4 <= 2.5" "z4 <= 3" "z4 <= 2"
[6] "z5 <= 2.4" "z5 <= 2" "z5 <= 1" "z5 <= 3" "size <= 29.1"
[11] "size <= 25" "size <= 20" "size <= 35" "z1" "z3"
[16] "grade3"
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 528
Events criteria: control >= 10 , treatment >= 10
Subgroup search completed in 0.01 minutes
Found 5 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 5
Random seed set to: 8316951
Removed 2 near-duplicate subgroups
Original rows: 5
After removal: 3
# of unique initial candidates: 3
# Restricting to top stop_Kgroups = 10
# of candidates to evaluate: 3
Algorithm: Two-stage sequential
Stage 1 splits: 30
Screen threshold: 0.763
Max total splits: 400
Batch size: 20
Parallel processing: callr with 6 workers
All subgroup evaluations returned NULL
Minutes forestsearch overall = 0.04
Consistency algorithm used: twostage
tau, maxdepth = 47.91247 2
leaf.node control.mean control.size control.se depth
2 3 -4.44 695.00 1.00 1
1 4 -4.77 568.00 1.11 2
GRF subgroup NOT found
=== Two-Stage Consistency Evaluation Enabled ===
Stage 1 screening splits: 30
Maximum total splits: 400
Batch size: 20
================================================
GRF stage for cut selection with dmin, tau = 12 0.6
tau, maxdepth = 49.5332 2
leaf.node control.mean control.size control.se depth
1 2 -4.24 689.00 1.07 1
2 4 4.78 83.00 3.36 2
3 5 -4.65 275.00 1.59 2
4 6 -6.00 336.00 1.55 2
GRF subgroup NOT found
GRF cuts identified: 0
# of continuous/categorical characteristics 4 3
Continuous characteristics: z2 z4 z5 size
Categorical characteristics: z1 z3 grade3
Default cuts included (1st 20) z2 <= mean(z2) z2 <= median(z2) z2 <= qlow(z2) z2 <= qhigh(z2) z4 <= mean(z4) z4 <= median(z4) z4 <= qlow(z4) z4 <= qhigh(z4) z5 <= mean(z5) z5 <= median(z5) z5 <= qlow(z5) z5 <= qhigh(z5) size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size)
Categorical: z1 z3 grade3
===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 17 cut expressions once and caching...
Cut evaluation summary:
Total cuts: 17
Valid cuts: 17
Errors: 0
✓ All 17 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====
# of candidate subgroup factors= 17
[1] "z2 <= 2.5" "z2 <= 2" "z2 <= 3" "z4 <= 2.5" "z4 <= 3"
[6] "z4 <= 2" "z5 <= 2.5" "z5 <= 2" "z5 <= 1.8" "z5 <= 3"
[11] "size <= 29.4" "size <= 25" "size <= 20" "size <= 35" "z1"
[16] "z3" "grade3"
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 595
Events criteria: control >= 10 , treatment >= 10
Subgroup search completed in 0.01 minutes
NO subgroup candidates found
tau, maxdepth = 49.5332 2
leaf.node control.mean control.size control.se depth
1 2 -4.24 689.00 1.07 1
2 4 4.78 83.00 3.36 2
3 5 -4.65 275.00 1.59 2
4 6 -6.00 336.00 1.55 2
GRF subgroup NOT found
=== Two-Stage Consistency Evaluation Enabled ===
Stage 1 screening splits: 30
Maximum total splits: 400
Batch size: 20
================================================
GRF stage for cut selection with dmin, tau = 12 0.6
tau, maxdepth = 46.7094 2
leaf.node control.mean control.size control.se depth
1 2 2.74 101.00 2.52 1
2 3 -4.65 599.00 1.06 1
3 4 3.72 91.00 2.46 2
4 5 -5.56 449.00 1.14 2
5 6 -6.60 105.00 2.94 2
GRF subgroup NOT found
GRF cuts identified: 0
# of continuous/categorical characteristics 4 3
Continuous characteristics: z2 z4 z5 size
Categorical characteristics: z1 z3 grade3
Default cuts included (1st 20) z2 <= mean(z2) z2 <= median(z2) z2 <= qlow(z2) z2 <= qhigh(z2) z4 <= mean(z4) z4 <= median(z4) z4 <= qlow(z4) z4 <= qhigh(z4) z5 <= mean(z5) z5 <= median(z5) z5 <= qlow(z5) z5 <= qhigh(z5) size <= mean(size) size <= median(size) size <= qlow(size) size <= qhigh(size)
Categorical: z1 z3 grade3
===== CONSOLIDATED CUT EVALUATION (IMPROVED) =====
Evaluating 18 cut expressions once and caching...
Cut evaluation summary:
Total cuts: 18
Valid cuts: 17
Errors: 0
Dropping variables (cut only has 1 level): z4 <= 4
Total cuts after dropping invalid: 17
✓ All 17 factors validated as 0/1
===== END CONSOLIDATED CUT EVALUATION =====
# of candidate subgroup factors= 17
[1] "z2 <= 2.4" "z2 <= 2" "z2 <= 1" "z2 <= 3" "z4 <= 2.5"
[6] "z4 <= 2" "z4 <= 1" "z5 <= 2.5" "z5 <= 2" "z5 <= 3"
[11] "size <= 29.3" "size <= 26" "size <= 20" "size <= 35" "z1"
[16] "z3" "grade3"
Number of possible configurations (<= maxk): maxk = 2 , # combinations = 595
Events criteria: control >= 10 , treatment >= 10
Subgroup search completed in 0.02 minutes
Found 2 subgroup candidate(s)
# of candidate subgroups (meeting all criteria) = 2
Random seed set to: 8316951
Removed 1 near-duplicate subgroups
Original rows: 2
After removal: 1
# of unique initial candidates: 1
# Restricting to top stop_Kgroups = 10
# of candidates to evaluate: 1
Algorithm: Two-stage sequential
Stage 1 splits: 30
Screen threshold: 0.763
Max total splits: 400
Batch size: 20
Parallel processing: callr with 6 workers
All subgroup evaluations returned NULL
Minutes forestsearch overall = 0.04
Consistency algorithm used: twostage
tau, maxdepth = 46.7094 2
leaf.node control.mean control.size control.se depth
1 2 2.74 101.00 2.52 1
2 3 -4.65 599.00 1.06 1
3 4 3.72 91.00 2.46 2
4 5 -5.56 449.00 1.14 2
5 6 -6.60 105.00 2.94 2
GRF subgroup NOT found
runtime_null <- difftime(Sys.time(), start_time, units = "mins")
cat("Completed in", round(runtime_null, 1), "minutes\n")
Completed in 0.8 minutes
6 Summarizing Results
6.1 Operating Characteristics Summary
# Summarize alternative hypothesis results
summary_alt <- summarize_simulation_results(results_alt)
print(summary_alt)
                 FS     GRF
any.H 0.850 0.500
sensitivity 0.870 0.900
specificity 0.990 0.950
ppv 0.910 0.770
npv 0.980 0.980
Avg(#H) 82.000 103.000
minH 63.000 68.000
maxH 126.000 157.000
Avg(#Hc) 630.000 648.000
minHc 574.000 543.000
maxHc 700.000 700.000
hat(H*) 2.350 NaN
hat(hat[H]) 2.418 2.389
hat(Hc*) 0.661 NaN
hat(hat[Hc]) 0.666 0.649
hat(H*)all 2.350 NaN
hat(Hc*)all 0.661 NaN
hat(ITT) 0.762 0.762
hat(ITTadj) 0.772 0.772
# Summarize null hypothesis results
summary_null <- summarize_simulation_results(results_null)
print(summary_null)
                 FS     GRF
any.H 0.100 0.000
sensitivity NaN NaN
specificity 0.870 NaN
ppv 0.000 NaN
npv 1.000 NaN
Avg(#H) 90.000 NA
minH 81.000 NA
maxH 99.000 NA
Avg(#Hc) 691.000 700.000
minHc 601.000 700.000
maxHc 700.000 700.000
hat(H*) NaN NA
hat(hat[H]) 1.663 NA
hat(Hc*) 0.828 NA
hat(hat[Hc]) 0.749 NA
hat(H*)all NaN NaN
hat(Hc*)all 0.828 NaN
hat(ITT) 0.730 0.730
hat(ITTadj) 0.721 0.721
6.2 AHR Metrics in Results (New)
The aligned analysis functions now compute AHR estimates in addition to Cox-based HRs:
# Check for AHR columns in results
ahr_cols <- grep("ahr", names(results_alt), value = TRUE)
cat("AHR columns in results:", paste(ahr_cols, collapse = ", "), "\n\n")
AHR columns in results: 
if (length(ahr_cols) > 0) {
# Summarize AHR estimates
results_found <- results_alt[results_alt$any.H == 1, ]
if (nrow(results_found) > 0 && "ahr.H.hat" %in% names(results_found)) {
cat("AHR estimates (when subgroup found):\n")
cat(" Mean AHR(H) estimated:", round(mean(results_found$ahr.H.hat, na.rm = TRUE), 3), "\n")
cat(" Mean AHR(Hc) estimated:", round(mean(results_found$ahr.Hc.hat, na.rm = TRUE), 3), "\n")
cat(" True AHR(H):", round(dgm_calibrated$AHR_H_true, 3), "\n")
cat(" True AHR(Hc):", round(dgm_calibrated$AHR_Hc_true, 3), "\n")
}
}
6.3 Formatted Tables
# Format operating characteristics for H1
format_oc_results(
results = results_alt,
title = "Operating Characteristics (Alternative Hypothesis)",
subtitle = sprintf("n = %d, %d simulations, HR(H) = %.2f",
sim_config_alt$n_sample,
sim_config_alt$n_sims,
dgm_calibrated$hr_H_true),
use_gt = TRUE
)
Operating Characteristics (Alternative Hypothesis)
n = 700, 20 simulations, HR(H) = 1.72

| Method | N Sims | Detection | Sensitivity | Specificity | PPV | NPV | HR_H_hat | HR_Hc_hat | HR_H_true | HR_Hc_true | HR_ITT | Size_H_mean | Size_H_min | Size_H_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FS | 20 | 0.850 | 0.868 | 0.987 | 0.910 | 0.982 | 2.418 | 0.666 | 2.350 | 0.661 | 0.762 | 82 | 63 | 126 |
| GRF | 20 | 0.500 | 0.898 | 0.951 | 0.768 | 0.984 | 2.389 | 0.649 | - | - | 0.762 | 103 | 68 | 157 |
# Format operating characteristics for H0
format_oc_results(
results = results_null,
title = "Type I Error (Null Hypothesis)",
subtitle = sprintf("n = %d, %d simulations, HR(overall) = %.2f",
sim_config_null$n_sample,
sim_config_null$n_sims,
dgm_null$hr_causal),
use_gt = TRUE
)
Type I Error (Null Hypothesis)
n = 700, 20 simulations, HR(overall) = 0.75

| Method | N Sims | Detection | Sensitivity | Specificity | PPV | NPV | HR_H_hat | HR_Hc_hat | HR_H_true | HR_Hc_true | HR_ITT | Size_H_mean | Size_H_min | Size_H_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FS | 20 | 0.100 | - | 0.871 | 0.000 | 1.000 | 1.663 | 0.749 | - | 0.828 | 0.730 | 90 | 81 | 99 |
| GRF | 20 | 0.000 | - | - | - | - | - | - | - | 0.730 | - | - | - | - |
6.4 Key Metrics
# Extract key metrics
cat("=== KEY OPERATING CHARACTERISTICS ===\n\n")
=== KEY OPERATING CHARACTERISTICS ===
cat("Alternative Hypothesis (H1):\n")
Alternative Hypothesis (H1):
for (analysis in unique(results_alt$analysis)) {
res <- results_alt[results_alt$analysis == analysis, ]
cat(sprintf(" %s: Power = %.3f, Sens = %.3f, Spec = %.3f, PPV = %.3f\n",
analysis,
mean(res$any.H),
mean(res$sensitivity, na.rm = TRUE),
mean(res$specificity, na.rm = TRUE),
mean(res$ppv, na.rm = TRUE)))
}
  FS: Power = 0.675, Sens = 0.879, Spec = 0.974, PPV = 0.857
  GRF: Power = 0.675, Sens = 0.879, Spec = 0.974, PPV = 0.857
cat("\nNull Hypothesis (H0):\n")
Null Hypothesis (H0):
for (analysis in unique(results_null$analysis)) {
res <- results_null[results_null$analysis == analysis, ]
cat(sprintf(" %s: Type I Error = %.4f\n",
analysis,
mean(res$any.H)))
}
  FS: Type I Error = 0.0500
  GRF: Type I Error = 0.0500
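With only 20 simulations, the detection-rate estimates above carry substantial Monte Carlo uncertainty. A quick way to gauge this is the binomial standard error and a normal-approximation interval; the helper below (`mc_se_rate` is a hypothetical name, not a package function) is a minimal sketch:

```r
# Monte Carlo standard error and approximate 95% CI for a detection
# rate (power or Type I error) estimated from n_sims binary any.H flags.
mc_se_rate <- function(p_hat, n_sims) {
  se <- sqrt(p_hat * (1 - p_hat) / n_sims)  # binomial SE of a proportion
  c(estimate = p_hat,
    se       = se,
    lower    = max(0, p_hat - 1.96 * se),   # truncate to [0, 1]
    upper    = min(1, p_hat + 1.96 * se))
}

# Example: power of 0.85 estimated from 20 simulations
round(mc_se_rate(0.85, 20), 3)
```

With n_sims = 20 the interval spans roughly 0.69 to 1.00, which is why several hundred simulations are typically recommended before drawing conclusions about operating characteristics.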
7 Advanced Topics
7.1 Comparing Standard vs Two-Stage Algorithm
# Run simulations with two-stage algorithm for comparison
results_twostage <- foreach(
sim = 1:100,
.combine = rbind,
.options.future = list(
packages = c("forestsearch", "survival", "data.table"),
seed = TRUE
)
) %dofuture% {
run_simulation_analysis(
sim_id = sim,
dgm = dgm_calibrated,
n_sample = sim_config_alt$n_sample,
confounders_base = confounders_base,
run_fs = TRUE,
run_fs_grf = FALSE,
run_grf = FALSE,
fs_params = fs_params_fast, # use_twostage = TRUE
verbose = FALSE
)
}
# Compare detection rates
cat("Standard algorithm power:", round(mean(results_alt$any.H[results_alt$analysis == "FS"]), 3), "\n")
cat("Two-stage algorithm power:", round(mean(results_twostage$any.H), 3), "\n")
7.2 Adding Noise Variables
Test ForestSearch robustness by including irrelevant noise variables:
# Run simulations with noise variables
results_noise <- foreach(
sim = 1:sim_config_alt$n_sims,
.combine = rbind,
.errorhandling = "remove",
.options.future = list(
packages = c("forestsearch", "survival", "data.table"),
seed = TRUE
)
) %dofuture% {
run_simulation_analysis(
sim_id = sim,
dgm = dgm_calibrated,
n_sample = sim_config_alt$n_sample,
confounders_base = confounders_base,
n_add_noise = 10, # Add 10 noise variables
run_fs = TRUE,
fs_params = fs_params,
verbose = FALSE
)
}
# Compare detection rates
cat("Without noise:", round(mean(results_alt$any.H), 3), "\n")
cat("With 10 noise vars:", round(mean(results_noise$any.H), 3), "\n")
7.3 Sensitivity Analysis: Varying Parameters
# Test different HR thresholds
thresholds <- c(1.10, 1.25, 1.50)
results_by_thresh <- list()
for (thresh in thresholds) {
results_by_thresh[[as.character(thresh)]] <- foreach(
sim = 1:100,
.combine = rbind,
.options.future = list(
packages = c("forestsearch", "survival", "data.table"),
seed = TRUE
)
) %dofuture% {
run_simulation_analysis(
sim_id = sim,
dgm = dgm_calibrated,
n_sample = sim_config_alt$n_sample,
confounders_base = confounders_base,
run_fs = TRUE,
fs_params = modifyList(fs_params, list(hr.threshold = thresh)),
verbose = FALSE
)
}
results_by_thresh[[as.character(thresh)]]$threshold <- thresh
}
# Combine and summarize
combined <- rbindlist(results_by_thresh)
combined[, .(power = mean(any.H), ppv = mean(ppv, na.rm = TRUE)),
         by = .(threshold, analysis)]
8 Saving Results
# Save simulation results for later use
save_simulation_results(
results = results_alt,
dgm = dgm_calibrated,
summary_table = summary_alt,
runtime_hours = as.numeric(runtime_alt) / 60,
output_file = "forestsearch_simulation_alt.Rdata",
# Include AHR metrics in saved output
ahr_metrics = list(
AHR_H_true = dgm_calibrated$AHR_H_true,
AHR_Hc_true = dgm_calibrated$AHR_Hc_true,
AHR = dgm_calibrated$AHR
)
)
save_simulation_results(
results = results_null,
dgm = dgm_null,
summary_table = summary_null,
runtime_hours = as.numeric(runtime_null) / 60,
output_file = "forestsearch_simulation_null.Rdata"
)
9 Complete Example Script
Here’s a minimal self-contained script for running a simulation study:
# ===========================================================================
# Complete ForestSearch Simulation Study - Minimal Example (Aligned)
# ===========================================================================
library(forestsearch)
library(data.table)
library(survival)
library(foreach)
library(doFuture)
# --- Configuration ---
N_SIMS <- 500
N_SAMPLE <- 500
TARGET_HR_HARM <- 1.5
# --- Setup parallel processing ---
plan(multisession, workers = 4)
registerDoFuture()
# --- Create DGM ---
# Option 1: Calibrate to Cox-based HR
k_inter <- calibrate_k_inter(target_hr_harm = TARGET_HR_HARM,
use_ahr = FALSE, verbose = TRUE)
# Option 2: Calibrate to AHR instead
# k_inter <- calibrate_k_inter(target_hr_harm = TARGET_HR_HARM,
# use_ahr = TRUE, verbose = TRUE)
dgm <- create_gbsg_dgm(model = "alt", k_inter = k_inter, verbose = TRUE)
# Verify hazard ratios (new aligned output)
cat("\nDGM Summary:\n")
cat(" Cox HR(H):", round(dgm$hr_H_true, 3), "\n")
cat(" AHR(H):", round(dgm$AHR_H_true, 3), "\n")
cat(" Cox HR(Hc):", round(dgm$hr_Hc_true, 3), "\n")
cat(" AHR(Hc):", round(dgm$AHR_Hc_true, 3), "\n")
# --- Run simulations ---
confounders <- c("v1", "v2", "v3", "v4", "v5", "v6", "v7")
results <- foreach(
sim = 1:N_SIMS,
.combine = rbind,
.options.future = list(
packages = c("forestsearch", "survival", "data.table"),
seed = TRUE
)
) %dofuture% {
run_simulation_analysis(
sim_id = sim,
dgm = dgm,
n_sample = N_SAMPLE,
max_follow = 60,
confounders_base = confounders,
run_fs = TRUE,
run_fs_grf = TRUE,
run_grf = FALSE,
fs_params = list(
hr.threshold = 1.25,
fs.splits = 300,
maxk = 2,
use_twostage = FALSE # Set TRUE for faster analysis
)
)
}
# --- Summarize ---
summary_table <- summarize_simulation_results(results)
print(summary_table)
# --- Display formatted table ---
format_oc_results(results = results, title = sprintf("Operating Characteristics (n=%d)", N_SAMPLE))
# --- Report AHR metrics (new) ---
results_found <- results[results$any.H == 1, ]
if (nrow(results_found) > 0 && "ahr.H.hat" %in% names(results_found)) {
cat("\nAHR Estimates:\n")
cat(" True AHR(H):", round(dgm$AHR_H_true, 3), "\n")
cat(" Mean estimated AHR(H):", round(mean(results_found$ahr.H.hat, na.rm = TRUE), 3), "\n")
}
10 Summary
This vignette demonstrated the complete workflow for evaluating ForestSearch performance through simulation:
| Step | Function | Purpose |
|---|---|---|
| 1. Create DGM | `create_gbsg_dgm()` | Define data generating mechanism |
| 2. Calibrate | `calibrate_k_inter()` | Achieve target subgroup HR (Cox or AHR) |
| 3. Validate | `validate_k_inter_effect()` | Verify heterogeneity control |
| 4. Simulate | `simulate_from_gbsg_dgm()` | Generate trial data |
| 5. Analyze | `run_simulation_analysis()` | Run ForestSearch/GRF |
| 6. Summarize | `summarize_simulation_results()` | Aggregate metrics |
| 7. Display | `format_oc_results()` | Create gt tables |
Key metrics to report:
- Power (H1) / Type I Error (H0): Subgroup detection rate
- Sensitivity: P(identified | true harm subgroup)
- Specificity: P(not identified | true complement)
- PPV: P(true harm | identified)
- NPV: P(true complement | not identified)
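These four classification metrics reduce to a 2x2 confusion matrix over subjects, crossing true harm-subgroup membership with identified membership. A minimal sketch (the function name and 0/1 indicator inputs are illustrative, not the package's internal implementation):

```r
# Classification metrics from per-subject truth/prediction indicators.
# truth: 1 if subject is in the true harm subgroup H, 0 otherwise
# pred:  1 if subject was classified into the identified subgroup
classification_metrics <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)
  fn <- sum(truth == 1 & pred == 0)
  fp <- sum(truth == 0 & pred == 1)
  tn <- sum(truth == 0 & pred == 0)
  c(sensitivity = tp / (tp + fn),  # P(identified | true harm subgroup)
    specificity = tn / (tn + fp),  # P(not identified | true complement)
    ppv         = tp / (tp + fp),  # P(true harm | identified)
    npv         = tn / (tn + fn))  # P(true complement | not identified)
}

# Toy example: 3 true-harm subjects, one missed; one false positive
truth <- c(1, 1, 1, 0, 0, 0, 0, 0)
pred  <- c(1, 1, 0, 1, 0, 0, 0, 0)
round(classification_metrics(truth, pred), 3)
```

In the simulation results these metrics are averaged only over replications in which a subgroup was actually found, which is why they appear as NaN under the null for GRF.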
New aligned features:
- AHR metrics: Alternative to Cox-based HR (from loghr_po)
- use_ahr calibration: Calibrate to AHR instead of Cox HR
- use_twostage: Faster two-stage search algorithm option
- Individual effects: Access theta_0, theta_1, loghr_po per subject
11 Session Info
sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Los_Angeles
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] doFuture_1.2.0 future_1.69.0 foreach_1.5.2
[4] gt_1.3.0 ggplot2_4.0.1 survival_3.8-6
[7] data.table_1.18.0 weightedsurv_0.1.0 forestsearch_0.0.0.9000
loaded via a namespace (and not attached):
[1] sass_0.4.10 generics_0.1.4 xml2_1.4.0
[4] shape_1.4.6.1 stringi_1.8.7 lattice_0.22-7
[7] listenv_0.9.1 digest_0.6.37 magrittr_2.0.4
[10] grf_2.5.0 evaluate_1.0.5 grid_4.5.1
[13] RColorBrewer_1.1-3 iterators_1.0.14 policytree_1.2.3
[16] fastmap_1.2.0 jsonlite_2.0.0 Matrix_1.7-3
[19] glmnet_4.1-10 processx_3.8.6 ps_1.9.1
[22] scales_1.4.0 codetools_0.2-20 cli_3.6.5
[25] rlang_1.1.6 parallelly_1.46.1 future.apply_1.20.1
[28] splines_4.5.1 withr_3.0.2 yaml_2.3.10
[31] tools_4.5.1 parallel_4.5.1 dplyr_1.1.4
[34] globals_0.18.0 vctrs_0.6.5 R6_2.6.1
[37] lifecycle_1.0.4 stringr_1.6.0 randomForest_4.7-1.2
[40] fs_1.6.6 htmlwidgets_1.6.4 callr_3.7.6
[43] pkgconfig_2.0.3 progressr_0.18.0 pillar_1.11.1
[46] gtable_0.3.6 glue_1.8.0 Rcpp_1.1.0
[49] xfun_0.53 tibble_3.3.0 tidyselect_1.2.1
[52] future.callr_0.10.2 rstudioapi_0.17.1 knitr_1.51
[55] farver_2.1.2 htmltools_0.5.8.1 rmarkdown_2.30
[58] compiler_4.5.1 S7_0.2.0
12 References
León LF, Marceau-West CT, He W, et al. (2024). “Identifying Patient Subgroups with Differential Treatment Effects: A Forest Search Approach.” Statistics in Medicine.
Athey S, Imbens GW. (2016). “Recursive partitioning for heterogeneous causal effects.” PNAS, 113(27):7353-7360.
Wager S, Athey S. (2018). “Estimation and inference of heterogeneous treatment effects using random forests.” JASA, 113(523):1228-1242.